Foreword





REPORTERS play an essential role in communicating
science to the public. In common with scientists, they desire
accuracy. Although health and medicine provide many exciting
stories, the biostatistics that scientists must use in their studies
presents special problems for reporters. It gives uncommon and
misleading meanings to common words like “significant,” “consistent,”
and “power.” Mathematical statistics often produces results
that are disturbingly counterintuitive, at least at first, to
laymen and scientists alike. In vital statistics and epidemiology,
definitions often seem arbitrary, and slight changes make considerable
differences in the findings.

Science writers often take short courses in special topics 
such as biostatistics. I have taught in some of these courses and 
have been impressed by the seriousness of the participants. Nevertheless,
they need some of this material in an accessible and
permanent form. 

Victor Cohn of the Washington Post has prepared this manual
to help all reporters cut through these statistical tangles. He
wants to give them a guide to the ways that statistics can clarify 
facts or mystify the reader. 

Cohn’s book grew out of the Media Project of our Health 
Science Policy Working Group of the Division of Health Policy 
Research and Education at Harvard University. I am pleased 
that faculty members of the Harvard School of Public Health 
have been able to help him produce this book as a visiting fellow 
in 1978 and 1984 and as a contributor to the Health Science 
Policy Working Group. 

Through the Media Project, with the help of Jay Winsten,
we have also examined sources of pressures on the science
writer.¹ In the future we want to use what we have learned
through many discussions with science writers to advise scientists
on their role in the media.

By such efforts, including this book, and by many similar
efforts in this and other fields, scientists and writers may gradually
upgrade the whole communication system, scientific and
journalistic. Thus we may clear the communication channel
between science and the public.

Frederick Mosteller 
FACTS AND FIGURES: WE CAN DO BETTER


Science is observation, experimentation, measurement, and all 
these involve numbers, whether we reporters pay attention to 
them or not. 

Statistics are used or misused even by people who tell us, “I
don’t believe in statistics,” then claim that all of us or most people
or many do such and such. The question for reporters is, how 
should we not merely repeat such numbers, stated or implied, 
but also interpret them to deliver the best possible picture of 
reality? 

We can be better reporters if we understand how the best 
statisticians—the best figurers—figure. And if we learn a few 
questions to help us separate the wheat from the chaff. 

I do not say that telling the truth—describing reality—will
then become easy, for we are constantly bombarded with sweeping
claims in convincing wrappings, and the disputed subjects
are endless. Medical and surgical treatments, radiation, pesticides,
nuclear power, the probability of environmental disasters,
the side effects of medicines—almost nothing seems settled.

Like it or not, we must wade in. Whether we will it or not,
we have in effect become part of the regulatory apparatus. Dr.
Peter Montague of Princeton University tells us, “The environmental
and toxic situation is so complex, we can’t possibly have
enough officials to monitor it. Reporters help officials decide
where to focus their activity!”

“Journalists opened up” the Love Canal toxic waste issue by
“independent investigation,” according to Cornell University’s
Dr. Dorothy Nelkin. “The extensive press coverage contributed
to investigations that eventually forced the re-staffing of the Environmental
Protection Agency and the creation of a national
toxic waste disposal program.”¹

That very coverage, however, may also have stampeded
public officials into hasty, ill-conceived studies that left unanswered
the crucial question: Did the Love Canal wastes actually
cause birth defects and other physical problems?² The
very way we report a medical or environmental controversy can
affect the outcome. If we ignore a bad situation, the public may
suffer. If we write “danger,” the public may quake. If we write
“no danger,” the public may be falsely reassured. If we paint an
experimental medical treatment too brightly, the public is given
false hope.

It is not just what we write, it is what we emphasize. A
National Cancer Institute survey indicated that many persons
refuse to consider healthy changes in life-style because they
think “carcinogens are everywhere in the environment.” Such
persons probably have read or heard again and again that most
cancers are environmentally related, although, in the opinion of
most informed scientists, most fatal “environmental” cancers are
related mainly to individual behavior, outstandingly smoking,
and very possibly diet. By various estimates, perhaps 5 to 15
percent of all cancers are related to exposures to man-made
carcinogens—chemicals we have inserted into the workplace,
foods, air, and water.³

When it comes to such emotionally charged and complex
issues, or when it simply comes to running for page one or
making the six o’clock news, the best among us sometimes overstate
or understate. Philip Meyer, veteran reporter and author
of Precision Journalism, writes, “Journalists who misinterpret
statistical data usually tend to err in the direction of overinterpretation.
. . . The reason for this professional bias is self-evident;
you usually can’t write a snappy lead upholding [the
negative]. A story purporting to show that apple pie makes you
sterile is more interesting than one that says there is no evidence
that apple pie changes your life.”⁴

We also work fast, sometimes too fast, with severe limits on
the space or time we may fill. We find it hard to tell editors or
news directors, “I haven’t had enough time. I don’t have the
story yet.” Even a long-term project or special may be hurriedly
done. In a newsroom “long-term” may mean a few weeks. A
major southern newspaper had to print a long, front-page retraction
after a series of front-page stories alleged that people
who worked at or lived near a plutonium plant suffered in excess
numbers from a blood disease. “Our reporters obviously had
confused statistics and scientific data,” the editor admitted. “We
did not ask enough questions.”⁵

We tend to oversimplify. We may report, “A study showed
that black is white” or “So-and-so announced that . . .” when a
study merely suggested that there was some evidence that such
might be the case. We may slight or omit the fact that a scientist
calls a result “preliminary.” As scientific unsophisticates, we may
confuse a study that merely suggests a hypothesis that should be
investigated—very frequently the case—with a study that
presents strong and conclusive evidence.

We often omit essential perspective, context, or background.
Dr. Thomas Vogt of the Kaiser Permanente Center for
Health Research tells of seeing the headline “Heart Attacks
From Lack of ‘C’” and then, two months later, “People Who
Take Vitamin C Increase Their Chances of a Heart Attack.”⁶
Both stories were based on limited, and far from conclusive,
animal studies.

Scientists who do poor studies or overstate their results
deserve part of the blame. But bad science is no excuse for bad
journalism. We tend to rely most on “authorities” who are either
most quotable or quickly available or both, and they often tend
to be those who get most carried away with their sketchy and
unconfirmed but “exciting” data—or have big axes to grind,
however lofty their motives. The cautious, unbiased scientist
who says, “Our results are inconclusive,” or “We don’t have
enough data yet to make any strong statement,” or “I don’t know”
tends to be omitted or buried someplace down in the story.

We are influenced too by intense and growing competition
to tell the story first and tell it most dramatically. I was once
asked by a Harvard researcher, “Does competition affect the way
you present a story?” I thought and had to answer, “We have to
almost overstate. We have to come as close as we can within the
boundaries of truth to a dramatic, compelling statement. A
weak statement will go no place.” Another reporter said, “The
fact is, you are going for the strong [lead and story]. And, while


The Certainty of Uncertainty



Too much of the science reporting in the press [blurs] what we’re sure of and
what we’re not very sure of and what is inconclusive. The notion of tentativeness
tends to drop out of much reporting.

—Dr. Harvey Brooks 


The only trouble with a sure thing is the uncertainty. 

—Author unknown 


The first thing to understand about science is that it is 
almost always uncertain. A scientist, seeking to explain or understand
something—be it the behavior of an atom or the effect
of the toxic chemicals at a Love Canal—usually proposes a 
hypothesis, then seeks to test it by experiment or observation. If 
the evidence is strongly supportive, the hypothesis may then 
become a theory or at some point even a law, like the law of 
gravity. 

A theory may be so solid that it is generally accepted. 
Example: the theory that cigarette smoking causes lung cancer, 
for which almost any reasonable person would say the case has 
been proved, for all practical purposes. The phrase “for all practical
purposes” is important, for scientists, being practical people,
must often speak at two levels: the strictly scientific level
and the level of ordinary reason that we require for daily guidance.

Example: In June 1985, 16 forensic experts examined the
bones that were supposedly those of the “Angel of Death,” Dr.
Josef Mengele. Dr. Lowell Levine, delegated by the Department
of Justice, then said, “The skeleton is that of Josef


listener phoned in to exclaim, “‘They say’ is a damned liar!”

“They,” of course, may be different theys who arrive at different
conclusions about inconclusive evidence in a thousand
areas: the role of fats and cholesterol in the diet, the effects of
low-level radioactivity, the cause of the extinction of dinosaurs.

Why so much uncertainty? Science is always a continuing
story. Nature is complex, and almost all methods of observation
and experiment are imperfect. “There are flaws in all studies,”
says Harvard’s Dr. Marvin Zelen.³ There may be weaknesses,
often unavoidable ones, in the way a study is designed or conducted.
Observers are subject to human bias and error. Subjects
fluctuate. Measurements fluctuate.

Many studies are thus inconclusive, and virtually no single
study proves anything. “Fundamentally,” writes Dr. Thomas
Vogt, “all scientific investigations require confirmation, and until
it is forthcoming all results, no matter how sound they may
seem, are preliminary.”⁴

Medicine, in particular, is full of disagreement and controversy.
“No clinical trial is ever perfect,” Harvard’s Dr. John
Bailar observes. Unlike new drugs, medical treatments and tests
and surgical operations need not even be subjected to experimental
studies before being applied. “Most treatments escape
and will continue to escape rigorous evaluation,” Bailar says.⁵

The reasons are many: lack of funds to mount enough
trials; lack of enough patients at any one center to mount a
meaningful trial; the expense and difficulty of doing multicenter
trials; the swift evolution and obsolescence of medical techniques;
the fact that, with the best of intentions, medical data—histories,
physical examinations, interpretations of tests, descriptions
of symptoms and diseases—are notoriously inexact and
vary from physician to physician; and the serious ethical obstacles
to trying a new procedure when an old one is doing some
good, or to experimenting on children, pregnant women, or the
mentally ill.

While all studies have flaws, some have more flaws than
others. Study after study has found that many articles in the
most prestigious medical journals are replete with shaky statistics
and lack of any explanation of such crucial matters as patients’
complications and the number of patients lost to follow-up.
Papers presented at medical meetings, many of them widely
reported by the media, are even less reliable. Many papers are
mere progress reports on incomplete studies. Some state tentative
results that later collapse. Some are given to draw comment
or criticism or get others interested in a provocative but still
uncertain finding.⁶

The upshot, according to Dr. Gary Friedman of the Kaiser
organization’s Permanente Medical Group: “Much of health
care is based on tenuous evidence and incomplete knowledge. . . .
Seemingly authoritative statements and accepted medical
doctrines, perpetuated through textbooks and lectures, often turn
out to be supported by the most meager of evidence, if any can
be found.”⁷

In general, possible risks tend to be underestimated and
possible benefits overestimated. For decades surgeons swore
that radical mastectomy was the only treatment for breast
cancer. Only recently were clinical trials mounted to show that
less drastic treatments seem equally effective. Prefrontal lobotomy,
overstrict bed rest, drugs by the carload—medical history
is rich in treatments that were given for years without question
or statistically rigorous study, only to be proved wrong and
discarded.

Occasionally, unscrupulous investigators falsify their results.
More often, they may wittingly or unwittingly play down
data that contradict their theories, or they may search out statistical
methods that give them the results they want. Before
ascribing fraud, says Harvard’s Dr. Frederick Mosteller, “keep
in mind the old saying that most institutions have enough incompetence
to explain almost any results.”⁸

So some uncertainty almost always prevails. But uncertainty
need not stand in the way of good sense. To live—to
survive on this globe, to maintain our health, to set public 
policy, to govern ourselves—we almost always must act on the 
basis of incomplete or uncertain information. There is a way we 
can do so. 






Somehow the wondrous promise of the earth is that there are things beautiful in
it, things wondrous and alluring, and by virtue of your trade, you want to
understand them.

— Mitchell Feigenbaum
Cornell University physicist and mathematician

The great tragedy of Science—the slaying of a beautiful hypothesis by an ugly 
fact. 

—Thomas Henry Huxley 


To reporters, the world is full of true believers, peddling
their “truths.” The sincerely misguided and the outright fakers
are often highly convincing, also newsy. How can we tell the 
facts, or the probable facts, from the chaff? 

We can borrow from science. We can try to judge all possible
claims of fact by the same methods and rules of evidence that
scientists use to derive some reasonable guidance in scores of 
unsettled issues. 

As a start, we can ask these questions: 

How do you know? 

Have the claims been subjected to any studies or experiments?

Were the studies acceptable ones, by general agreement? For example:
Were they without any substantial bias?

Have results been fairly consistent from study to study? 

Have the findings resulted in a consensus among others in the same 
field? Do at least the majority of informed persons agree? Or should we 
withhold judgment until there is more evidence? 

Always: Are the conclusions backed by believable statistical evidence? 
And what is the degree of certainty or uncertainty? How sure can you
be? 


Obviously, much of statistics involves attitude or policy 
rather than numbers. And much, at least much of the statistics 
that reporters can most readily apply, is good sense. 

There are many definitions of statistics as a tool. A few 
useful ones: The science and art of gathering, analyzing, and 
interpreting data; a means of deciding whether an effect is real; 
a way of extracting information from a mass of raw data; a set 
of mathematical processes derived from probability theory. 

Statistics can be manipulated by charlatans, self-deluders,
and inexpert statisticians. Deciding on the truth of a matter can 
be difficult for the best statisticians, and sometimes no decision is 
possible. Uncertainty will ever rule in some situations and lurk 
in almost all. 

There are rare situations in which no statistics are needed.
“Edison had it easy,” says Dr. Robert Hooke, a statistician and
author. “It doesn’t take statistics to see that a light has come on.”¹
It did not take statistics to tell 19th-century physicians that Morton’s
ether anesthesia permitted painless surgery or to tell 20th-century
physicians that the first antibiotics cured infections that
until then had been highly fatal.

Overwhelmingly, however, the use of statistics, based on
probability, is called the soundest method of decision making,
and the use of large numbers of cases, statistically analyzed, is
called the only means for determining the unknown cause of
many events. Birth control pills were tested on several hundred
women, yet the pills had to be used for several years by millions
before it became unequivocally clear that some women would
develop heart attacks or strokes. The pills had to be used for
some years more before it became clear that the greatest risk
was to women who smoked and women over 35.
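Why did it take millions of users? A rough calculation helps. The sketch below assumes, purely for illustration, that a serious side effect strikes 1 user in 10,000 and that users are affected independently of one another; neither figure comes from the pill studies. It computes the chance that a study of a given size sees even one case.

```python
# Illustrative sketch: chance that a study observes at least one case of a
# rare side effect. The risk figure is hypothetical, and each person is
# assumed to be affected independently of everyone else.


def chance_of_seeing_at_least_one(n_users: int, risk: float) -> float:
    """Probability that at least one of n_users has the event."""
    # The chance that nobody has it is (1 - risk) ** n_users.
    return 1.0 - (1.0 - risk) ** n_users


RISK = 1 / 10_000  # assumed: 1 serious event per 10,000 users

for n in (500, 10_000, 1_000_000):
    p = chance_of_seeing_at_least_one(n, RISK)
    print(f"{n:>9,} users: {p:6.1%} chance of seeing at least one case")
```

With these assumed numbers, a trial of a few hundred women has under a 5 percent chance of seeing a single case, while a million users make it all but certain that cases turn up. Seeing cases is still not the same as showing that the drug caused them.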

The best statisticians, let alone practitioners on the firing 
line (for example, physicians), often have trouble deciding when 
a study is adequate or meaningful. Most of us cannot become 
statisticians, but we can at least learn that there are studies and
studies, and the unadorned claim “We made a study” or “We did
an experiment” may not mean much. We can learn to ask more
pointed questions if we understand some basic concepts and 
other facts about scientific studies. 

These are some bedrock statistical concepts: 

• Probability 

• “Power” and numbers

• Bias and confounders 

• Variability 

Probability 

Scientists cope with uncertainty by measuring probabilities. 
Since all experimental results and all events can be influenced 
by chance and almost nothing is 100 percent certain in science 
and medicine and life, probabilities sensibly describe what has 
happened and should happen in the future under similar conditions.
Aristotle said, “The probable is what usually happens,” but
he might have added that the improbable happens more often 
than most of us realize. 

The accepted numerical expression of probability in evaluating
scientific and medical studies is the P (or probability) value.
The P value is one of the most important figures a reporter
should look for. It is determined by a statistical formula that
takes into account the numbers of subjects or events being compared
in order to answer the question, could a difference or
result this great or greater have occurred by chance alone? By more
precise definition, the P value expresses the probability that an
observed relationship or effect or result could have seemed to
occur by chance if there had actually been no real effect. A low P value
means a low probability that this happened, that a medical
treatment, for example, might have been declared beneficial
when in truth it was not.
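To make the definition concrete, here is a minimal sketch of one common way to get a P value: an exact binomial calculation. The numbers are hypothetical. Suppose 60 of 100 patients improve on a treatment, and suppose that with no real effect each patient would have had a 50-50 chance of improving anyway. The code adds up the probability, under that no-effect assumption, of every outcome at least as far from the expected 50 as the one observed. Real studies usually compare two groups and use other tests, so treat this as an illustration of the idea, not the method any particular study used.

```python
from math import comb


def two_sided_p_value(successes: int, n: int, p_null: float = 0.5) -> float:
    """Exact two-sided binomial P value.

    Sums the probability, assuming no real effect (each subject succeeds
    with probability p_null), of every outcome at least as far from the
    expected count as the observed one. This 'distance from expected' rule
    is one of several conventions; for p_null = 0.5 they agree.
    """
    expected = n * p_null
    observed_gap = abs(successes - expected)
    total = 0.0
    for k in range(n + 1):
        if abs(k - expected) >= observed_gap:
            # Binomial probability of exactly k successes out of n.
            total += comb(n, k) * p_null**k * (1 - p_null) ** (n - k)
    return total


# Hypothetical trial: 60 of 100 patients improve.
print(f"P = {two_sided_p_value(60, 100):.3f}")
```

Here the P value comes out a little under .06: a split of 60 to 40, or something more lopsided, would turn up by chance alone in roughly 1 trial in 18, not quite rare enough to pass the conventional .05 cutoff.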

Here is why the P value is used to evaluate results. A 
limits or range). This is what happens when a political pollster
reports that candidate X would now get 50 percent of the vote
and thereby lead candidate Y by 3 percentage points, “with a 3-percentage-point
margin of error, plus or minus, and a 95 percent
confidence level.” In other words, Mr. or Ms. Pollster is 95
percent confident that X’s share of the vote would be someplace
between 47 and 53 percent. Similarly, candidate Y’s share might
be 3 percentage points greater (or less) than the figure predicted.
In a close election, that margin of error could obviously turn a
predicted defeat into victory. And that sometimes happens.

An important point in looking at the results of political polls
(and any other statements of confidence): In the reports we
read, the plus or minus 3 (or whatever) percentage points is
often omitted, and the pollster merely mentions a “3-point
margin of error.” This means there is actually a 6-point range
within which the truth probably lurks.

The more people who are questioned in a political poll or
the larger the number of subjects in a medical study, the greater
the chance of a high confidence level and a narrow, and therefore
more reassuring, confidence interval.
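The usual textbook formula behind a poll’s margin of error can be sketched in a few lines. This assumes a simple random sample and the normal approximation; real pollsters also weight their samples, which usually widens the margin somewhat. The sample sizes are illustrative: about 1,070 respondents gives roughly the 3-point margin in the example above, and quadrupling the sample only halves the margin.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(share: float, n: int, confidence: float = 0.95) -> float:
    """Approximate margin of error for a poll share from a simple random
    sample of n people, using the normal approximation."""
    # z is about 1.96 for 95 percent confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * sqrt(share * (1 - share) / n)


share = 0.50  # candidate X's share of the vote in the example
for n in (300, 1_067, 4_268):
    moe = margin_of_error(share, n)
    print(f"n = {n:>5,}: plus or minus {moe:.1%}, "
          f"range {share - moe:.1%} to {share + moe:.1%}")
```

Because the margin shrinks with the square root of the sample size, each extra point of precision costs far more interviews than the last.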

No matter how reassuring they sound, P values and confidence
statements cannot be taken as gospel, for .05 is not a
guarantee, just a number. There are several important reasons 
for this. 

• All that P values measure is the probability that the results
might have been produced by some sneaky random process. In
20 results where only chance is at work, 1, on the average, will
have a reassuring-sounding but misleading P value of < .05.
One, in short, may be a false positive.

Dr. Marvin Zelen points out that there may be 6,000 to
10,000 clinical (medical) trials of cancer treatment under way
today, and if the conventional value of .05 is adopted as the
upper permissible limit for false positives, then every 100 studies
with no actual benefit may, on average, produce 5 false-positive
results. Hence, we may expect 50 false-positive results, on
red blood count (say, 0.1 g/100 mL, or a tenth of a gram per
100 milliliters), may be statistically significant yet medically
meaningless.⁴

• Eager scientists can consciously or unconsciously manipulate
the P value by failing to adjust for other factors, by choosing
to compare different end points in a study (say, condition on
leaving the hospital rather than length of survival), or by choosing
the way the P value is calculated or reported.

There are several mathematical paths to a P value, such as
the chi-square (χ²), t, F, r, and paired t tests. All may be legitimate.
But be warned. Dr. David Salsburg of Pfizer, Inc., has
written in the American Statistician of the unscrupulous practitioner
who “engages in a ritual known as ‘hunting for P values’”
and finds ways to modify the original data to “produce a rich
collection of small P values” even if those that result from simply
comparing two treatments “never reach the magical .05.”

“If you look hard enough through your data,” contributes
an investigator at a major medical center, “if you do enough
subset analyses, if you go through 20 subsets, you can find
one”—say, “the effect of chemotherapy on premenopausal
women with two to five lymph nodes”—“with a P value less than
.05. And people do this.”
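The arithmetic behind “go through 20 subsets” is simple enough to check. The sketch below assumes there is no real effect in any subset and, for simplicity, that the 20 looks at the data are independent; real subsets of one trial usually overlap, so treat the figures as a rough guide. When there is no real effect, a properly computed P value is equally likely to land anywhere between 0 and 1, so the simulation draws random numbers as stand-ins for P values.

```python
import random

ALPHA = 0.05  # the conventional cutoff for "significance"


def expected_false_positives(n_tests: int, alpha: float = ALPHA) -> float:
    """Average number of 'significant' results when nothing is going on."""
    return n_tests * alpha


def chance_of_any_false_positive(n_tests: int, alpha: float = ALPHA) -> float:
    """Chance that at least one of n independent no-effect tests dips below alpha."""
    return 1 - (1 - alpha) ** n_tests


def simulate_subset_hunt(n_subsets: int = 20, n_studies: int = 10_000,
                         seed: int = 1) -> float:
    """Share of simulated no-effect studies in which a search through
    n_subsets subgroups turns up at least one P value below ALPHA."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        # With no real effect, each P value is uniform between 0 and 1.
        if any(rng.random() < ALPHA for _ in range(n_subsets)):
            hits += 1
    return hits / n_studies


print(expected_false_positives(20))   # about 1 per 20 no-effect tests
print(expected_false_positives(100))  # about 5 per 100 ineffective trials
print(f"{chance_of_any_false_positive(20):.0%}")
print(f"{simulate_subset_hunt():.0%}")
```

With 20 independent looks, the chance that at least one comes up “significant” by luck alone is about 64 percent, better than even, and the simulation lands close to that figure.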

“Statistical tests provide a basis for probability statements,”
writes Dr. John Bailar, “only when the hypothesis is fully developed
before the data are examined. . . . If even the briefest
glance at a study’s results moves the investigator to consider a
hypothesis not formulated before the study was started, that
glance destroys the probability value of the evidence at hand.”
(At the same time, Bailar adds, “review of data for unexpected
clues . . . can be an immensely fruitful source of ideas” for new
hypotheses “that can be tested in the correct way.” And occasionally
“findings may be so striking that independent confirmation
. . . is superfluous.”)*

A rather sophisticated—and possibly touchy—line of questioning
that some reporters might want to try if they’re skeptical:
How did you arrive at your P value? Did you use the test planned in 
children of this age group, we would expect only 3 cases in 100
years. But in this nation with thousands of schools, we would
occasionally—such is chance—find schools with 3 or more cases
in a single year. Then one is faced with the problem of interpretation,”
Zelen says. “Is this one of those rare events that is surely
going to be observed? Or is it due to some causal factor?”

A reporter in this instance might ask a statistician at the
National Cancer Institute or a medical center: What is the
chance of such an event in such a population? How many
similar unusual events are probably never reported?
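A statistician answering that question might start with a calculation like the following minimal sketch. It assumes cases occur independently and at random (a Poisson model) and uses the 3-cases-per-100-years expectation from Zelen’s example. The figure of 20,000 schools watched for 10 years is a hypothetical round number, not from the source.

```python
from math import exp, factorial


def poisson_at_least(k: int, mean: float) -> float:
    """P(X >= k) for a Poisson-distributed count with the given mean."""
    below_k = sum(exp(-mean) * mean**i / factorial(i) for i in range(k))
    return 1 - below_k


# From the example: 3 expected cases per school per 100 years.
cases_per_school_year = 3 / 100

# Chance that one particular school has 3 or more cases in a given year.
p_one_school_year = poisson_at_least(3, cases_per_school_year)

# Hypothetical scale: 20,000 comparable schools watched for 10 years.
school_years = 20_000 * 10
p_somewhere = 1 - (1 - p_one_school_year) ** school_years

print(f"One school, one year: about 1 in {1 / p_one_school_year:,.0f}")
print(f"Somewhere among {school_years:,} school-years: {p_somewhere:.0%}")
```

Under these assumptions, a cluster that would be astonishing at any one school becomes more likely than not to appear somewhere. That does not rule out a real cause at a particular school; it shows why a cluster by itself is weak evidence.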


“Power” and Numbers


This gets us to another statistical concept: power. Statistically,
“power” means the probability of finding something if it’s
there. Example: Given that there is a true effect, say a difference
between two medical treatments or an increase in cancer caused
by a toxin in a group of workers, how likely are we to find it?

Sample size confers power. Statisticians say, “Funny things
can happen in small samples without meaning very much” . . .
“There is no probability until the sample size is there” . . .
“Large numbers confer power” . . . “Large numbers at least
make us sit up and take notice.”*

All this concern about sample size can also be expressed as
the law of large numbers, which says that as the number of cases
increases, the probable truth of a conclusion or forecast increases.
The validity (truth or accuracy) and reliability (reproducibility)
of the statistics begin to converge on the truth.
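A quick simulation shows the law of large numbers at work. This sketch flips a fair simulated coin (an assumption standing in for any chance process) and reports how far the share of heads strays from the true 50 percent as the number of flips grows. The seed is fixed only so the run is repeatable.

```python
import random


def share_of_heads(n_flips: int, rng: random.Random) -> float:
    """Flip a fair simulated coin n_flips times; return the share of heads."""
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips


rng = random.Random(42)  # fixed seed so the run can be repeated exactly
for n in (10, 100, 1_000, 10_000, 100_000):
    share = share_of_heads(n, rng)
    print(f"{n:>7,} flips: {share:.3f} heads, off by {abs(share - 0.5):.3f}")
```

Small runs can stray far from one-half; large runs rarely do. The typical error shrinks roughly in proportion to one over the square root of the number of flips, so a hundredfold increase in flips cuts it only about tenfold.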

We already learned this when we talked about probability.


*There is another, unrelated use of the word “power.” Scientists commonly speak of
increasing or “raising” some quantity by a power of 2 or 3 or 100 or whatever. “Power”
here means the product you get when you multiply a number by itself one or more
times. Thus, in 2 × 2 = 4, 4 is the second power of 2, or, to put it another way, there
are two 2s in your equation. This is commonly written 2² and known as 2 to the second
power or just 2 to the second. In 2 × 2 × 2 = 8, 2 has been raised to the third power.
When you think about 2¹⁰⁰, you see the need for the shorthand.
page 33] for a number like that is 100—that is, the square root
of the original number. That means the number may vary by a
minimum of 200 every year without even considering growth,
the business cycle, or any other effect. This will supplement
your reporter’s approach.”

Looking for error in reported results, statisticians try to spot
both false positives and false negatives. The false positive (or Type
I, or alpha, error in statistical language you may see) is to find a
result or effect where there is none. The false negative (or Type II,
or beta, error) is to miss an effect where there is one. The latter is
particularly common when there are small numbers. “There are
some very well conducted studies with small numbers, even five
patients, in which the results are so clear-cut that you don’t have
to worry about power,” says Dr. Reiman. “You still have to
worry about applicability to a larger population, but you don’t
have to doubt that there was an effect. When results are negative,
however, you have to ask, How large would the effect have
to be to be discovered?”

Many scientific and medical studies are underpowered—
that is, they include too few cases. “Whenever you see a negative
result,” another scientist says, “you should ask, What is the
power? What was the chance of finding the result if there was
one?” One study found that an astonishing 70 percent of 71
well-regarded clinical trials that reported no effect had too few
patients to show a 25 percent difference in outcome. Half of the
trials could not have detected a 50 percent difference.*
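Power can be estimated before a study starts. The sketch below uses a standard normal-approximation formula for comparing the share of patients who do well in two equal-sized groups at the usual .05 level. The survival figures (50 percent versus 60 percent) are hypothetical, and real trials often use more refined methods, so read the output as a rough illustration of how power climbs with sample size.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(p_control: float, p_treated: float, n_per_group: int,
                 alpha: float = 0.05) -> float:
    """Approximate power of a two-sided test comparing two proportions.

    Normal approximation with equal group sizes; ignores the tiny chance
    of a 'significant' result in the wrong direction.
    """
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 at .05
    variance = p_control * (1 - p_control) + p_treated * (1 - p_treated)
    se = sqrt(variance / n_per_group)  # standard error of the difference
    return NormalDist().cdf(abs(p_treated - p_control) / se - z_crit)


# Hypothetical: 50 percent of control patients survive, 60 percent of treated.
for n in (50, 100, 200, 385, 800):
    print(f"{n:>4} patients per group: power = {approx_power(0.50, 0.60, n):.0%}")
```

Under these assumptions, 50 patients per group gives less than a 1-in-5 chance of detecting a real 10-point improvement, and it takes roughly 385 per group to reach the commonly sought 80 percent power.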

A statistician scanned an article on colon cancer in a leading
journal. “If you read the article carefully,” he said, “you will
see that if one treatment was better than the other—if it would
increase median survival by 50 percent, from five to seven and a
half years, say—they had only a 60 percent chance of finding it
out. That’s little better than tossing a coin!”

The weak power of that study would be expressed numerically
as .6, or 60 percent. Scan an article’s fine print or footnotes,
and you will sometimes find such a power statement. Most
What are your numbers? After all, some researchers reportedly
announced a new treatment for a disease of chickens by saying,
“33.3 percent were cured, 33.3 percent died, and the other one
got away.”


Bias and Confounders 


One scientist once said that lefties are overrepresented
among baseball’s heavy hitters. He saw this as “a possible result
of their hemispheric lateralization, the relative roles of the two
sides of the brain.” A critic who had seen more ball games said
some simpler covariables could explain the difference. When
they swing, left-handed hitters are already on the move toward
first base. And most pitchers are right-handers who throw most
often to right-handed hitters.¹¹

Scientist A was apparently guilty of bias, meaning the introduction
of spurious associations and error by failing to consider
other influential factors. The other factors may be called covariables,
covariates, intervening or contributing variables, confounding variables,
or confounders. A simpler term may be “other explanations.”

Statisticians call bias “the most serious and pervasive problem
in the interpretation of data from clinical trials” . . . “the
central issue of epidemiological research” . . . “the most common
cause of unreliable data.” Able and conscientious scientists
try to eliminate biases or account for them in some way. But not
everybody who makes a scientific, medical, or environmental
claim is that skilled. Or that honest. Or that all-powerful. Some
biases are unavoidable by the very difficulty of much research,
and the most insidious biases of all, says one statistician, are
“those we don’t know exist.”

Some biases may be uncovered by assiduous investigation. 
A father noticed that every time one of his 11 kids dropped a 
piece of bread on the floor, it landed with the buttered side up. 
“This utterly defies the laws of chance,” he exclaimed. Close
examination disclosed the cause: The kids were buttering their
bread on both sides.

I told this story to one statistician, who said, “I was once
called about a person who had won first, second, and third 
prizes in a church lottery. I was asked to assess the probability 
that this could have happened. I found out that the winner had 
bought nearly all the tickets.”

He had of course asked the obvious question for both scientist
and reporter: Could the relationship described be explained by other
factors?
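One standard way to put that question to data is to split the comparison by a suspected other explanation and see whether the association survives. The counts below are invented for illustration: an exposure looks as if it nearly triples disease risk, but the exposed group simply contains more smokers, and within smokers and within nonsmokers the exposure makes no difference at all.

```python
# Invented counts for illustration: (cases, people) in each group.
data = {
    "smokers":    {"exposed": (80, 800), "unexposed": (20, 200)},
    "nonsmokers": {"exposed": (2, 200), "unexposed": (8, 800)},
}


def risk_ratio(exposed, unexposed):
    """Risk among the exposed divided by risk among the unexposed."""
    (cases_e, people_e), (cases_u, people_u) = exposed, unexposed
    return (cases_e / people_e) / (cases_u / people_u)


def pooled(group):
    """Add up cases and people for one group across all strata."""
    cases = sum(stratum[group][0] for stratum in data.values())
    people = sum(stratum[group][1] for stratum in data.values())
    return cases, people


# Crude comparison: ignore smoking entirely.
crude = risk_ratio(pooled("exposed"), pooled("unexposed"))
print(f"Crude risk ratio, ignoring smoking: {crude:.2f}")

# Stratified comparison: compare like with like.
for name, stratum in data.items():
    rr = risk_ratio(stratum["exposed"], stratum["unexposed"])
    print(f"Risk ratio among {name}: {rr:.2f}")
```

Here smoking is the confounder: it is linked both to the exposure and to the disease. Stratifying works only for factors someone thought to measure, which is why the biases “we don’t know exist” are the hardest to catch.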

Not everyone will tell you, of course, for bias is a pervasive 
human failing. As one candid scientist is said to have admitted, 
“I wouldn’t have seen it if I hadn’t believed it.” Enthusiastic 
investigators often tell us their findings are exciting. But they 
may be so exciting that the investigators paint the results in 
over-rosy hues. 

Other powerful human drives—the race for academic promotion
and prestige, financial connections—can also create conscious
or unconscious conflicts of interest or attitudes that feed
bias. Dr. Thomas Chalmers of Mount Sinai Medical Center in
New York tells of a drug trial, financed by a pharmaceutical
firm, in which both the head of the study committee and the
main statisticians and analysts were the firm’s employees,
though not so identified in any credits. He tells of a study of oral
drugs for diabetes in which the fact that the first author had
previously published 14 articles on the subject, and in 7 had
acknowledged support by the drug manufacturers, was “not
known to the reader.”

In contrast, Chalmers describes a study also financed by a
drug firm but with a contract specifying a study protocol designed
by independent investigators and monitored by an outside
board less likely to be influenced by a desire for a favorable
outcome. “It is never possible to eliminate” potential conflicts of
interest in biomedical research, he concludes, but they should be
disclosed so others can evaluate them. 13

Even a genius may be biased. Horace Freeland Judson of 
Johns Hopkins University tells how Isaac Newton experimented 
with prisms and lenses and developed a theory of color, light, 
and the solar spectrum. He did not report seeing some dark
lines—absorption lines, which mark varying wavelengths—that
his instruments must have shown. A modern scientist argues
that Newton’s theory, not his instruments, had no place for that
evidence: “To the observing scientist, hypothesis is both friend
and enemy.” 14

For years technicians making blood counts were guided by
textbooks that told them two or more “properly” studied samples
from the same blood should not vary beyond narrow “allowable”
limits. Reported counts always stayed inside those limits. A
Mayo Clinic statistician rechecked and found that at least two-thirds
of the time the discrepancies exceeded the supposed
limits. The technicians had been seeing what they had been told
to expect and discounting any differences as mistakes. This also
saved them from the additional labor of doing still more counting.

Both the biased observer and the biased subject are common in
medicine. A researcher who wants to see a treatment result may
see one. A patient may report one out of eagerness to please the
researcher. There is also the powerful placebo effect. Summarizing
many studies, one scientist found that half the patients with 
headaches or seasickness—and a third of those suffering from 
coughs, mood changes, anxiety, the common cold, and even the 
disabling chest pains of angina pectoris—reported relief from a 
“nothing pill.” 1 ® A placebo is not truly a nothing pill; the mere 
expectation of relief seems to trigger important effects within the 
body. But in a careful study the placebo should not do as well as 
a test medication; otherwise the test medication is no better than 
a placebo. 
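One common way to ask whether a test medication beat the placebo is to compare the share of patients reporting relief in each group. The sketch below uses a large-sample two-proportion z test, a standard textbook method chosen here only for illustration; the trial counts (60 of 100 relieved on the drug, 35 of 100 on the placebo) are hypothetical.

```python
from math import erfc, sqrt

def two_proportion_z(success_a, n_a, success_b, n_b):
    """Large-sample z test for a difference between two proportions,
    using the pooled proportion for the standard error.
    Returns (z, two-sided p-value)."""
    p_a, p_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("no variation: every patient responded, or none did")
    z = (p_a - p_b) / se
    p_value = erfc(abs(z) / sqrt(2))  # P(|Z| >= |z|) for a standard normal Z
    return z, p_value

# Hypothetical trial: drug 60/100 relieved, placebo 35/100 relieved
z, p = two_proportion_z(60, 100, 35, 100)
print(f"z = {z:.2f}, two-sided p = {p:.4f}")
```

A small p-value says only that a gap this large would rarely arise by chance if drug and placebo worked equally well; it says nothing about bias in how the trial was run.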

Sampling bias is the bugaboo of both political polls and medical
studies. Say you want to know what proportion of the populace
has heart disease, so you stand on a corner and ask people
as they pass. Your sample is biased, if only because it leaves out
those too disabled to get around. Your problem, a statistician
would say, is selection. A political pollster who fails to build a valid
probability sample, easy when questioning only a thousand or
logical or environmental study), there were probably many
dropouts. A well-conducted study should describe and account
for them. A study that does not may report a favorable treatment
result by ignoring the fate of the dropouts—a confounding
variable.
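The dropout problem comes down to simple arithmetic. In this hypothetical sketch, 100 patients start a treatment, 30 drop out, and 50 of the 70 who finish improve. Counting only the finishers gives a success rate of about 71 percent; keeping all 100 patients in the denominator and assuming, conservatively, that no dropout improved gives 50 percent. That worst-case assumption about dropouts is one convention among several, not the only defensible one.

```python
def improvement_rates(randomized: int, dropouts: int, improved: int) -> tuple:
    """Return (completers-only rate, rate over everyone who started),
    counting every dropout as not improved in the second figure."""
    completers = randomized - dropouts
    if completers <= 0:
        raise ValueError("no patients completed the study")
    if not 0 <= improved <= completers:
        raise ValueError("improved must be between 0 and the number of completers")
    completers_only = improved / completers
    all_randomized = improved / randomized  # conservative: dropouts counted as failures
    return completers_only, all_randomized

# Hypothetical: 100 start, 30 drop out, 50 of the 70 finishers improve
completers_only, all_randomized = improvement_rates(100, 30, 50)
print(f"finishers only: {completers_only:.0%}, everyone who started: {all_randomized:.0%}")
```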

Age, gender, occupation, nationality, race, income, socioeconomic
status, health status, and powerful behaviors like
smoking are all possible confounding—and frequently ignored—variables.
In the 1970s, foes of adding fluoride to city
water pointed to crude cancer mortality rates in two groups of
10 U.S. cities. One group had added fluoride to water, the other
had not, and from 1950 to 1970 the cancer mortality rate rose
faster in the fluoridated cities. The National Cancer Institute
pointed out that the two groups were not equal: The difference
in cancer deaths was almost entirely explained by differences in
age, race, and sex. The age-, race-, and sex-adjusted figures
actually showed a slightly lower mortality rate, for unexplained
reasons, in the fluoridated cities. 17
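Adjusting for age, as the National Cancer Institute did, can be sketched with direct standardization: apply each group's age-specific death rates to one shared standard population. The numbers below are invented for illustration, adjust for age only (not race and sex), and are not the fluoridation data. Group A looks far worse on crude rates (585 versus about 316 deaths per 100,000) simply because it has more old people, yet it has the lower age-adjusted rate (440 versus 470).

```python
# Hypothetical age-specific death rates per 100,000, and populations.
# Group A is older; group B has the higher rate in every age band.
RATES = {
    "A": {"young": 50, "middle": 300, "old": 1500},
    "B": {"young": 55, "middle": 320, "old": 1600},
}
POPULATION = {
    "A": {"young": 300_000, "middle": 400_000, "old": 300_000},
    "B": {"young": 500_000, "middle": 400_000, "old": 100_000},
}

def crude_rate(rates, pop):
    """Total deaths divided by total population, per 100,000."""
    deaths = sum(pop[age] * rates[age] / 100_000 for age in pop)
    return deaths / sum(pop.values()) * 100_000

def age_adjusted_rate(rates, standard_pop):
    """Direct standardization: weight each age-specific rate by the
    share of that age band in a common standard population."""
    total = sum(standard_pop.values())
    return sum(rates[age] * standard_pop[age] / total for age in standard_pop)

# Use the two groups combined as the standard population
standard = {age: POPULATION["A"][age] + POPULATION["B"][age] for age in POPULATION["A"]}

for group in ("A", "B"):
    print(group,
          f"crude = {crude_rate(RATES[group], POPULATION[group]):.1f}",
          f"age-adjusted = {age_adjusted_rate(RATES[group], standard):.1f}")
```

The choice of standard population changes the adjusted numbers, so published comparisons should say which standard was used.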

If you look carefully at the fate of women taking birth 
control pills, you find that advancing age and smoking are the 
two great confounders. You must take both into account to find 
the greatest clusters of ill effects. Smoking has been an important 
confounder in studies of industrial contaminants like asbestos, 
in which, again, the smokers suffer a disproportionate number 
of ill effects. 18

A 1947 survey of Chicago lawyers showed that those who
had mere high school diplomas before entering legal training
earned 6.3 percent more, on the average, than college graduates.
The confounder here—the real explanation—was age. In
1947 there were still many older lawyers without college degrees,
and they were simply older, on the average, and hence
more established. 19
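The lawyers' paradox can be reproduced with a few invented numbers (statisticians often call this pattern Simpson's paradox). In the hypothetical table below, college graduates out-earn high school graduates within each age band, yet the high school group earns more when the bands are pooled, because most of its members are older.

```python
# Hypothetical data: (number of lawyers, mean income in dollars) by age band and schooling
GROUPS = {
    ("younger", "college"): (80, 5_000),
    ("younger", "high school"): (10, 4_800),
    ("older", "college"): (20, 9_000),
    ("older", "high school"): (40, 8_800),
}

def pooled_mean(education):
    """Mean income for one schooling group, lumping the age bands together."""
    cells = [value for (age, edu), value in GROUPS.items() if edu == education]
    total_lawyers = sum(n for n, _ in cells)
    return sum(n * income for n, income in cells) / total_lawyers

for age in ("younger", "older"):
    print(f"{age}: college {GROUPS[(age, 'college')][1]}, "
          f"high school {GROUPS[(age, 'high school')][1]}")
print(f"pooled: college {pooled_mean('college'):.0f}, "
      f"high school {pooled_mean('high school'):.0f}")
```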

Occupational studies often confront another seeming paradox:
The workers exposed to some possible adverse effect turn
out to be healthier than a control group of persons without such
exposure. The confounder: the well-known healthy-worker effect.
Workers tend to be healthier and live longer than the population
in general, which includes people too sick or disabled to hold a job.

Some studies of workers in steel mills showed no overall
increase in cancer, despite possible exposures to various carcinogens.
It took a look at black workers alone to find excess cancer.
They commonly worked at the coke ovens, where carcinogens
were emitted. This was a case where the population had to be
stratified, or broken up in some meaningful way, to find the facts.
Such findings in blacks may often be falsely ascribed to race or
genetics, when the real or at least the most important contributing
or ruling variables—to a statistician, the independent variables—are
occupation and the social and economic plights that
put blacks in vulnerable settings. The excess cancer is the dependent
variable, the result.

“In a two-variable relationship,” Dr. Gary Friedman explains,
“one is usually considered the independent variable,
which affects the other or dependent variable.” 20 Take the fact
that more people get colds in winter. Here weather is commonly
seen as the underlying or independent variable, which affects
incidence of the common cold, the dependent variable. Actually,
of course, some people, like children in school who are constantly
exposed to new viruses, are more vulnerable to colds
than others. In the case of these children, then, as in the case of
the black workers at the coke ovens, there is often more than
one independent variable. Also, some people think that an important
underlying reason for the prevalence of colds in winter
may be that children are congregated in school, giving colds to
each other, thence to their families, thence to their families’
coworkers, thence to the coworkers’ families, and so on. But
cold weather—and home heating?—may still figure, perhaps by
drying nasal passages and making them more vulnerable to


The search for true variables is obviously one of the main
pursuits of the epidemiologist, or disease detective—or of any
physician who wants to know what has affected a patient, or of
any student of society who seeks true causes. Like colds, many
medical conditions, such as heart disease, cancer, and probably
mental illness, have multiple contributing factors. Where many
known, measurable factors are involved, statisticians can use
mathematical techniques—the terms you will see include multiple
regression, multivariate analysis, and discriminant analysis and factor,
cluster, path, and two-stage least-squares analyses—to relate all the
variables and try to find which are the truly important predictors.
Yet some situations, like the striking decline in U.S. heart
disease mortality in recent years, defy such analyses. These
years have seen several major changes in American life that
may play a role: less smoking among men, consumption of a
leaner diet, more recreational exercise (though more sedentary
work). Medical care is far better, including the treatment of
hypertension, which predisposes people to heart disease. Many of
these variables cannot be well measured, and the effect of some
is debatable, so—a common situation in science—the truth remains
uncertain.

Variability 

Doctors always say, “Most things are better in the morning,”
and they’re mostly right. Most chronic or recurring conditions
wax and wane. We tend to wake up at night when the condition
is at its worst. Then, no matter what is done by way of treatment
the next day, the odds are that we’ll feel better.

This is regression toward the mean: the tendency of extreme values,
in every field of science—physical, biological, social, and economic—to
move toward the average when they are measured again. Tall
fathers tend to have sons who are shorter than they are (though
still taller than average), and short fathers, sons who are taller
than they are. The students who get the highest grades on an exam
tend to get somewhat lower ones the next time. The regression
effect is common to all repeated measurements.
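Regression toward the mean needs no cause beyond chance, as this hypothetical simulation suggests. Each simulated student has a fixed ability plus independent luck on each of two exams; all the parameters are invented. The students who scored in the top 10 percent on the first exam still score above average on the second, but noticeably lower than they did the first time, with no change in what they know.

```python
import random
import statistics

def two_exams(n_students=10_000, seed=7):
    """Hypothetical scores: each = fixed ability + independent luck on the day."""
    rng = random.Random(seed)
    first, second = [], []
    for _ in range(n_students):
        ability = rng.gauss(70, 8)
        first.append(ability + rng.gauss(0, 6))
        second.append(ability + rng.gauss(0, 6))
    return first, second

first, second = two_exams()
cutoff = sorted(first)[int(0.9 * len(first))]        # score at the 90th percentile
top = [i for i, score in enumerate(first) if score >= cutoff]
top_first = statistics.mean([first[i] for i in top])
top_second = statistics.mean([second[i] for i in top])
print(f"top 10% on exam 1, exam 1 average: {top_first:.1f}")
print(f"same students, exam 2 average:     {top_second:.1f}")
print(f"all students, exam 2 average:      {statistics.mean(second):.1f}")
```

The same logic explains "better in the morning": patients seek help at their worst, so some improvement would follow even with no treatment.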

Regression is part of an even more basic phenomenon:
variation, or variability. Virtually everything that is measured varies
from measurement to measurement. When repeated, every
experiment has at least slightly different results. Take a patient’s
blood pressure, pulse rate, or blood count several times in a
row, and the readings will be somewhat different. Take them at
different times of day or on different days, and the readings may
vary greatly.

The important reasons? In part, fluctuating physiology, but
also measurement errors, the limits of measurement accuracy,
and observer variation. No two doctors examining the same patient
will report exactly the same results, and the results may
be grossly different. If six doctors examine a patient with a faint
heart murmur, only one or two may have the skill or keen
hearing to detect it. Experimental results so typically differ from
one time to the next that scientific and medical fakers—a Boston
cancer researcher, for example—have been detected by the unusual
regularity of their reported results, with numbers agreeing
too well and the same results appearing time after time, with not
enough variation from patient to patient.

Biological variation is the most important cause of variation in
physiology and medicine. Different patients, and the same patients
at different times, react differently to the same treatment. Disease rates
differ in different parts of the country and among different populations,
and—alas, nothing is simple—there is natural variation
within the same population.

Every population, after all, is a collection of individuals,
each with many characteristics. Each characteristic, or variable,
such as height, has a distribution of values from person to person,
and—if we would know something about the whole population—we
must have some handy summaries of the distribution.
We can’t get much out of a list of 10,000 measurements, so we
need single values that summarize many measurements.

Enter here the familiar average or, more exactly, the mean, 
median, and mode. These and a few other measures can give us 
some idea of the look of the whole and its many measurable 
properties, or parameters.

When most of us speak of an average, we mean simply the
mean or arithmetic average, the sum of all the values divided by the
number of values. The mean is no mean tool; it is a good way
to get a typical number, but it has limitations, especially when
there are some extreme values. There is said to be a memorial
in a Siberian town to a fictitious Count Smerdlovski, the world's
champion at Russian roulette. On the average he won, but his
actual record was 73 and 1. 21

If you look at the average salary in a hospital, you will not
know that half the personnel may be working for the minimum
wage, while a few hundred persons make $100,000 or more a
year. You may learn more here from the median, the figure that
divides a population into two equal halves. The median can be
of value when a group has a few members with extreme values,
like the 400-pounder at an obesity clinic whose other patients
weigh from 180 to 200 pounds. If he leaves, the patients’ mean
weight might drop by 10 pounds, but the median might drop
just 1 pound. 22
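The obesity-clinic example can be checked with Python's standard `statistics` module. The 20 other weights below are hypothetical, chosen to run from 180 to 200 pounds; with them, removing the 400-pounder drops the mean from 200 to 190 pounds but the median only from 191 to 190.

```python
import statistics

# Hypothetical clinic: 20 patients weighing 180-200 lb (nobody at exactly 190)
weights = [w for w in range(180, 201) if w != 190]
with_outlier = weights + [400]   # add the 400-pounder

for label, data in (("with the 400-lb patient", with_outlier), ("without him", weights)):
    print(f"{label}: mean = {statistics.mean(data)}, median = {statistics.median(data)}")
```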

The most frequently occurring number or value in a distribution
is called the mode. When the median and the mode are
about the same, or even more when mean, median, and mode
are roughly equal, you can feel comfortable about knowing the
typical value.

You still need to know something about the exceptions, in
short, the dispersion (or spread or scatter) of the entire distribution.
One measure of spread is the range. It tells you the lowest
and highest values. It might inform you, for example, that the
salaries in that hospital range from $10,000 to $250,000.

You can also divide your values into 100 percentiles, so you
can say someone or something falls into the 10th or 71st percentile,
or into quartiles (fourths) or quintiles (fifths). One useful
measure is the interquartile range, the interval between the 25th
and 75th percentiles—this is the middle half of the distribution,
which avoids the extreme values at each end. Or you can divide
a distribution into subgroups—those with incomes from $10,000
to $20,000, for example, or ages 20 to 29, 30 to 39, and so on.
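Python's `statistics.quantiles` computes percentile cut points; the department salaries below are hypothetical. There are several conventions for estimating percentiles from a sample (the function's default is its `exclusive` method), so different software may give slightly different quartiles for the same data.

```python
import statistics

# Hypothetical annual salaries in one hospital department (dollars)
salaries = [10_000, 14_000, 15_000, 16_000, 18_000, 21_000, 24_000,
            30_000, 38_000, 52_000, 95_000, 250_000]

low, high = min(salaries), max(salaries)
q1, q2, q3 = statistics.quantiles(salaries, n=4)   # quartile cut points
deciles = statistics.quantiles(salaries, n=10)     # 9 cut points: 10th ... 90th percentile
print(f"range: {low} to {high}")
print(f"median (50th percentile): {q2}")
print(f"interquartile range: {q1} to {q3} (width {q3 - q1})")
print(f"90th percentile: {deciles[-1]}")
```

The single $250,000 salary stretches the range enormously but barely moves the interquartile range.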

All these values can easily be plotted. With many of the 
things that scientists, economists, or others measure—IQs, for
example, and other test scores—we typically tend to see a famil¬ 
Example: If the average score of all students who take the
SAT college entrance test is relatively low and the spread—the
standard deviation—relatively large, this creates a very long-tailed,
low-humped curve of test scores, ranging, say, from
around 300 to 1500. But if the average score of a group of
brighter students entering an elite college is high, the standard
deviation of the scores will probably be smaller and the curve will be
high-humped and short-tailed, going from maybe 900 to 1500.

“If I just told you the means of two such distributions, you
might say they were the same,” another scientist says. “But if I
reported the means and the standard deviations, you’d know
they were different, with a lot more variation in one.”
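The scientist's point can be shown with two small, hypothetical sets of test scores that share a mean of 1,000 but differ greatly in spread. `statistics.stdev` gives the sample standard deviation.

```python
import statistics

# Hypothetical test scores for two groups with the same average
broad_group = [600, 800, 1000, 1200, 1400]
narrow_group = [950, 980, 1000, 1020, 1050]

for name, scores in (("broad", broad_group), ("narrow", narrow_group)):
    print(f"{name}: mean = {statistics.mean(scores)}, "
          f"SD = {statistics.stdev(scores):.1f}")
```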

From a human standpoint, variation tells us that it takes
more than averages to describe individuals. Biologist Stephen
Jay Gould learned in 1982 that he had a serious form of cancer.
The literature told him the median survival was only eight
months after discovery. Three years later he wrote in Discover,
“All evolutionary biologists know that means and medians are
the abstractions,” while variation is “the reality,” meaning “half
the people will live longer” than eight months.

Since he was young, since his disease had been diagnosed
early, and since he would receive the best possible treatment, he
decided he had a good chance of being at the far end of the
curve. He reasoned that the curve must be skewed well to the
right, as the left half of the distribution had to be “scrunched up
between zero and eight months, but the upper right half [could]
extend out for years.” He concluded, “I saw no reason why I
shouldn’t be in that small tail. ... I would have time to think,
to plan and to fight.” Also, since he was being placed on an
experimental new treatment, he might, if fortune smiled, “be in
the first cohort of a new distribution with ... a right tail extending
to death by natural causes at advanced old age.” 23

Statistics cannot tell us whether fortune will smile, only that
such reasoning is sound.
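Gould's reasoning about a right-skewed curve can be illustrated with simulated survival times. This sketch assumes, purely for illustration, a lognormal distribution with a median of eight months; it is not Gould's actual data. The mean comes out well above the median, exactly half the simulated patients outlive the median, and the longest survivors in the right tail last for years.

```python
import math
import random
import statistics

def simulated_survival_months(n=10_000, median_months=8.0, spread=1.0, seed=42):
    """Hypothetical right-skewed survival times (months) from a lognormal
    distribution. A lognormal's median is exp(mu), so mu = log(median)."""
    rng = random.Random(seed)
    mu = math.log(median_months)
    return [rng.lognormvariate(mu, spread) for _ in range(n)]

times = simulated_survival_months()
med = statistics.median(times)
share_longer = sum(t > med for t in times) / len(times)
print(f"median about {med:.1f} months, mean about {statistics.mean(times):.1f} months")
print(f"share outliving the median: {share_longer:.2f}")
print(f"longest simulated survival: {max(times) / 12:.1f} years")
```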
brought forth for on-camera testimonials. Except for some
newspapers that decided to print nothing, the story flew far and 
wide. 

The head investigator, a chief resident in neurosurgery, 
cautioned that the results, though encouraging, were Very 
tally* and “certainly do not prove this is an effective treatment* 
He advised healthy skepticism. But headlines unequivocally
read: “Alzheimer's Test Found Successful,” “Alzheimer's: A New
Promise,” “First Breakthrough Against Alzheimer's,” “Pump Offers
Hope,” “Possible Alzheimer's Cure.”

Within two months the medical center logged 2,600 phone 
calls, mainly from desperate families, and critics began asking
why a press conference had been held, since a study of only four 
patients—with unblinded investigators getting their assessments 
from hopeful families—meant little. 

Harvard’s Dr. Jay Winsten concluded that “the decision to 
hold a press conference ... far outweighed in impact the modulating
effect of the investigators’ qualifying language. The visual
impact of [one] patient’s on-camera testimonials all but
guaranteed that TV coverage would oversell the research, despite
any qualifying language.” 1

When dubious claims are made—about Alzheimer's, a new 
cancer drug, a possible AIDS cure—and the claims get widely 
reported, there is commonly a lot of postmortem clucking and 
soul-searching among reporters and editors. Then someone else 
makes some sensational claim, and the same thing may happen 
all over again. 

The biggest error in medical science, according to Dr. 
Thomas Chalmers, is “the uncontrolled pilot study in which the 
investigators try a treatment on 10 patients, and if it seems to 
work . . . are tempted to report it” to fellow scientists, let alone
the media. 2 

All science is only a stab at the truth. Even with the best of
statistics, “We scientists don’t know how to tell the whole truth,”
Mosteller reminds us. 3 Outside this honest limitation lie vast
realms of inadequate science with plausible-sounding yet shaky
statistics. A French physician, Pierre Charles Alexandre Louis,
said 150 years ago, “The only reproach which can be made to
the numerical method” is that it “requires much more labor and
time than the most distinguished members of our profession”
often give it. “Some days,” says one modern statistician, “I think
every idiot in the country who can put his hands on a computer
program thinks he’s a statistician.”

The big problems of statistics, say its best practitioners, 
have little to do with computations and formulas. They have to 
do with judgment, we’re told, with how to design a study, how 
to conduct it, then analyze and interpret the results. In a day of
frenzied media competition for the public’s eye and ear—and 
many chances to do harm by shaky reporting—journalism too 
calls for sophisticated judgment. How, then, can we have some 
hope of telling which studies seem credible, which we should 
report? 

A fundamental principle is that every conscientiously conducted
study has a careful design: a method or plan of attack to
include the right kind and number of patients or petri dishes
and to try to eliminate bias. Different problems require different
methods, and one of the most basic questions in science is: Can
this kind of experiment, this design, yield the answer?

This is not a simple question for a reporter to answer, but 
there is much we can know. What kinds of studies, what kinds 
of numbers and controls and methods, should we look for? 

Experiments versus Seductive Anecdotes 

Students and eggs can be graded, citizens and cities can be 
credit-rated, and scientific evidence can be weighed according to 
what has been called a hierarchy of evidence. Some kinds of 
studies carry little weight, some more, some a great deal. 

Science and medicine started with anecdotes, unreliable as far
as generalization is concerned, yet provocative. Anecdotes matured
into systematic observation, the most ancient form of
science. Observation told the ancients much about the stars, it
told the pharaohs’ physicians much about the sick, and it is still
important, for simple “eyeballing” has developed into data collection
and the recording of case histories. These are respectable, yea,
indispensable methods, yet still only one part of science. Case
histories may not be typical, or they may reflect the beholder.
Medicine continues to be plagued by Big Authorities who insist,
“I know what I see.”

There can be useful, even inspired, observation and analysis
of natural experiments. Excess fluoride in some waters hardened
teeth, and this observation led to fluoridation of drinking water
to prevent tooth decay. There are also man’s inadvertent experiments,
disastrous and benign, to be studied. Hiroshima triggered
wide analysis of the effects of nuclear radiation, invaluable
yet frustrating because there were no good measures of exposure
levels, a gap that has caused confusion and controversy ever
since.

In 1585 or so, Galileo dropped those weights from a tower 
and helped invent the scientific experiment: a study in which the 
experimenter controls the conditions—controlled conditions are 
the heart of the experimental method—and records the effect. 
Experiments on objects, animals, germs, and people matured 
into the modern experimental study, in which the experimenter
typically changes only one or some other planned number of 
variables to see the outcome. 

Clinical Trials 

The experimental method is the essence of experimental
medicine’s current “gold standard”: 4 the controlled, randomized clinical
trial. At its best, the investigator tests a treatment or drug or
some other intervention by randomly assigning patients to at least two
comparable groups, the experimental group that is tested or treated and
a control group that is observed for comparison.

True clinical trials are expensive and difficult. It has been 
estimated that of 100 scheduled trials, 60 are abandoned, not 
controls are often misleading—the groups compared are frequently
not comparable, the treatments may have been given
by different methods—but they are still at times useful. 

What Makes a Study Honest? 

Obviously, all studies, including the best, have potential 
pitfalls: 

• Lack of adequate controls is fatal if you really want to put the 
results in the bank. 

• The group or sample studied, 10 people or 10,000, must be
large enough to get a valid result and representative enough to
apply to a larger population. Because people vary so widely in
their reactions, and a few patients can fool you, fair-sized groups
of patients are usually needed. And enough of the right kind of
subjects are needed for a suitable sample. Picking patients for a
medical study is no different from picking citizens to be questioned
in a political poll. In both, a sample is studied, and
inferences—the outcome of an election, the results in patients in
general—are made for a larger population.

To get a large enough sample, medical researchers more
and more try to conduct multicenter trials, which are appealing
because they can include hundreds of patients, but expensive
and tricky because one must try to maintain similar patient
selection and quality control at 10 or 100 institutions. Successful
multicenter trials established the value of controlling hypertension
to prevent strokes. They demonstrated the strong probability
that less extensive surgery is as effective as more drastic
surgery for many breast cancers.

• The sample should be randomized—divided by some random
method into comparable experimental and control groups. Randomization
can easily be violated. A doctor assigning patients to
treatment A or B may, seeing a particular type of patient, say or
think, “This patient will be better on B.”

If treatment B has been established as better than A, there 
should be no random study in the first place and certainly no 
study of that doctor's patient. When randomization is violated,
“the trial’s guarantee of lack of bias goes down the drain,” says
one critique. As a result, patients who consent to randomization
are often assigned to study groups according to a list of computer-generated
random numbers.

• To combat bias—the influence of confounding variables—
and get answers applicable to various populations, the sample or
study population must often be stratified, or separated into
groups by age, sex, socioeconomic status, and so on. Failure to
stratify can hide true associations. The role of high-absorbency
tampons in toxic shock syndrome was clarified only when the
cases were broken down by precise type of tampon used.

The identification of important subcategories of patients 
can be tricky indeed. A study of open-heart surgery patients 
may fail to separate out those who had to wait for their surgery. 
But some patients die waiting, and those left are relatively 
stronger patients who do better, on the average, than those 
treated immediately after diagnosis. 

We reporters may also fail to pay attention to stratification,
or distribution. In early 1985 the President’s Council of Economic
Advisers reported that—to quote the page-one lead in a
major newspaper—“elderly Americans have achieved economic
parity with the rest of the population and no longer are a disadvantaged
group.” Not for several paragraphs, now on an inside
page, did the story note that “there’s a lot of variability” and
older people are also “more likely ... to have members with
incomes below the average of their age group.”* In short, there
are still many elderly trapped in poverty.

• To combat bias in investigators or patients, studies should be
blinded—to the extent feasible, single-, double-, or, best of all, triple-blinded,
so that neither the doctors nor the nurses administering
a treatment nor the patients nor those who assess the results
know whether today’s pill is treatment A, treatment B, or an
ineffective placebo. Otherwise, a doctor or patient who yearns for
a good result may see or feel one when the “right” drug is given.
There is a tale of an overzealous receptionist who, knowing
which patients were getting the real drug and not the placebo,
was so encouraging to these patients that they began saying they
felt good, willy-nilly.

Barring observant receptionists, the use of a placebo—from
the Latin meaning “I shall please”—may help maintain blindness.
Placebos actually give some relief in a third of all patients,
on the average, in various conditions. The effect is usually temporary,
however, and a truly effective drug ought to work substantially
better than the placebo.

Blinding is often impossible or unwise. Some treatments
don’t lend themselves to it, and some drugs quickly reveal themselves
by various effects. But an unblinded test is a weaker test.

• Finally, what makes a study honest is honesty. John Bailar
warns of deliberate or careless deceptions that seem to be universally
accepted today, practices that sometimes have much
value but at other times are “inappropriate and improper and,
to the extent that they are deceptive, unethical.” Among them:
the selective reporting of findings, leaving out some that might
not fit the conclusion; the reporting of a single study in multiple
fragments, when the whole might not sound so good; and the
failure to report the low power of some studies, their inability to
detect a result even if one existed. 7
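The computer-generated random assignment described in the randomization point above can be sketched in a few lines. This hypothetical example shuffles 20 patient numbers with a seeded random generator and splits them into two equal arms; the function name, seed, and patient numbers are illustrative. In a real trial the allocation list would typically be prepared in advance and concealed from the people enrolling patients, so that no doctor can steer a particular patient to treatment B.

```python
import random

def randomize(patient_ids, seed=None):
    """Assign patients to two equal-sized arms using a computer-generated
    random order: shuffle the list, then split it in half."""
    rng = random.Random(seed)
    order = list(patient_ids)
    rng.shuffle(order)
    half = len(order) // 2
    return {"treatment A": sorted(order[:half]), "treatment B": sorted(order[half:])}

# Hypothetical patient numbers 1-20; the seed makes the list reproducible for auditing
arms = randomize(range(1, 21), seed=2024)
for arm, ids in arms.items():
    print(arm, ids)
```

Shuffling and splitting keeps the arms the same size; larger trials often randomize in small blocks or within strata such as age groups, for the reasons given in the stratification point.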

Dr. Charles Moertel of the Mayo Clinic says,


Probably the majority of cancer patients treated with chemotherapy
today are receiving regimens that have not been proved effective by
randomized trial. . . . Many articles published in our major journals
make claims for fantastic therapeutic accomplishments with no randomized
controls. . . . Many, if not most, of the randomized studies . . .
are of such poor quality that their results are unbelievable. . . .
Precious few have withstood the scrutiny of carefully designed
confirmatory scientific study.


He calls a multitude of poor methods statistical legerdemain:
“the games we play, trying to squeeze out that little bit of
breakthrough.” Why the pressure to play them? “Salvation,” Dr.
David Salsburg answers. “Fruit in this world (increases in salary,
prestige, invitations to speak) and beyond this life (continual
references in the citation index).” 8

Epidemiology: Hippocrates to AIDS 

Clinical studies deal with patients. Epidemiology deals with
populations, which sometimes are large groups of patients. Epidemiology
seeks the causes of both health and disease by placing
a population under its own kind of microscope, the epidemiological
investigation.

Epidemiological studies in many ways parallel clinical studies—some
studies are both—and are subject to many of the
same pitfalls and rules, like avoiding bias and stratifying to get
the right answers about the right subgroups. An old saw, in fact,
goes: An epidemiologist is a physician broken down by age and
sex.

Epidemiology in its early days was concerned wholly with
epidemics of typhoid, smallpox, and other infections. But epidemiologists
today also ask, “What should we eat and how should
we live to stay healthy?” and they study large groups to see how
the healthiest and unhealthiest live. Hippocrates has been called
the first environmentalist because he observed that it was
healthier to live in high places than in low ones. Anticipating
today’s environmentalists, he blamed bad air and bad water and
may have been partly right. But he failed to stratify; otherwise
he might have noticed that the people who lived high were also
wealthier and better nourished than those who lived low. 9

In 1775 Percival Pott scored a famous epidemiological
success by observing the high rate of scrotal cancer in London’s
chimney sweeps and correctly blaming it on their exposure
to soot—burned organic material, much like a smoked cigarette.
Some 80 years later, John Snow, plotting London cholera
cases on a map and noting a cluster around one source of
drinking water, had the handle removed from the now famed Broad
Street pump and helped end a deadly epidemic. The 19th-century
French advocate of statistical methods, Pierre Louis,
observed hospital patients and helped stop the use of bleeding as
a treatment. Ignaz Semmelweis showed that doctors’ dirty
hands transmitted deadly childbed fever to mothers.

Modern epidemiologists successfully indicted smoking as a
cause of lung cancer and heart disease and identified the association
of fats and cholesterol with clogging of the arteries. They
evaluate vaccines, assess new methods of health care delivery,
and track down the causes of new scourges like AIDS, toxic
shock syndrome, and Legionnaires’ disease, all by several
methods. All are valuable. All are full of traps.

• Epidemiology, like all of science, started with observational
studies, and these remain important. They are weak and uncertain,
we have noted, when it comes to determining cause and
effect. Yet observation is how we first learned of the unfortunate
effects of toxic rain, Agent Orange, cigarette smoking, and
many sometimes helpful, sometimes harmful medications—and
of the role of certain sexual practices and addicts’ use of dirty
needles in spreading AIDS.

• Some observational studies are simply descriptive—describing
the incidence, prevalence, and mortality rates of various
diseases, for example. Other studies, called analytic, seek to analyze or
explain: the Seven-Country Study, for example, that helped 
associate high meat and dairy fat and cholesterol consumption 
with excess risk of coronary heart disease. Ecological studies look 
for links between environmental conditions and illness. Human 
migrations—like that of the Japanese who come to the United 
States, eat more fat, and get more disease than they did in 
Japan—are among valuable natural experiments. 

• The simplest observational measurement is a count. Sampling
is just a more sophisticated kind of count. You can’t count
or question everybody, so you seek a sample that represents the 
whole. Many epidemiological surveys rely on samples—among 
them, government surveys of health and nutritional habits. 
Samples and surveys often use questionnaires to get information. 

A sample or survey is never more than a snapshot of the 
smoking to lung cancer, the association of birth control pills with
blood vessel problems, and the transmission patterns of AIDS
were identified in case-control studies that pointed to the need 
for broader investigation. 

Cohort or incidence studies are motion pictures. They pick a 
group of people, or cohort—a cohort was a unit of a Roman
legion—often stratify or divide them into subgroups, then follow 
them over time, often for years, to see how some disease or 
diseases develop. These studies are costly and difficult. Subjects 
drop out or disappear. Large numbers must be studied to see 
rare events. But cohort studies can be powerful instruments and 
substitutes for randomized experiments that would be ethically 
impossible. You can’t ethically expose a group to an agent that 
you suspect would cause a disease. You can watch a group so 
exposed. 

The noted Framingham study of ways of life that might be 
associated with developing heart disease has followed more than 
5,000 residents of that Massachusetts town since 1948. The 
American Cancer Society's 1952-55 study of 187,783 men aged 
50 to 69, with 11,780 of them dying during that period, did 
much to establish that cigarette smoking was strongly associated 
with developing lung cancer. 10 

• Many epidemiological, as well as clinical, studies are 
handicapped because they must be retrospective. They look back 
in time—at medical records, vital statistics, or people’s recollections
(for example, those collected in interviews in a case-control
study). People who have a disease are questioned to try to find 
common habits or exposures. Women with cervical cancer are 
interviewed to see how many took possibly guilty hormones and 
how many did not. People who live around a Love Canal are 
asked if they have been ill.

Retrospective studies are notoriously unreliable. Memories
fail or play tricks. Old records are poor and misleading. Definitions
of diseases and methods of diagnosis vary sharply over the
years. The patients you find may not be representative. A retrospective
study, however intriguing, generally only says that
there may be something here that ought to be investigated.


Questions Reporters 
Can Ask 


Just because Dr. Famous or Dr. Bigshot says this is what he found doesn’t mean
it is necessarily so.

—Dr. Arnold Relman


Ask to see the numbers, not just the pretty colors. 

—Dr. Richard Margolin
National Institutes of Health,
describing PET scans to reporters


WHAT questions should we reporters ask—to make our
news solid, to report the more valid claims and ignore the weak
and phony? When a scientist or physician or anyone else says,
“I’ve discovered that . . . ,” what should we ask?

In 1949, a year after Britain’s National Health Service— 
“socialized medicine”—was launched, my editors sent me to
Britain to see how it was working. A bit stumped, I asked Dr.
Morris Fishbein, the provocative genius who long edited the
Journal of the American Medical Association, “How can I, a reporter,
tell whether a doctor is doing a good job?” He immediately said,
“Ask him how often he has a patient take off his shirt.”

His lesson was plain: No physical examination is complete 
unless the patient takes off his or her clothes. Most reporters are 
not skilled statisticians, but we can ask some similarly revealing 
questions. Many of these are not even statistical, just simple 
ones that, like Fishbein’s, probe soft spots and often disclose 
either a conscientious approach or one that can’t be trusted. 
We can learn here from one method of science. We said 
Why did you do it that way? Do you think it was the right kind of
study to get the answer to this question or problem? 

Was it a true human experiment, if possible, with comparable groups
picked at random for comparison? If not, why not? And what was the
substitute? 

If an investigator patiently—you hope—tells you about an 
acceptable-sounding design, that’s worth a brownie point. If the 
answer is “Huh?” or a nasty one, that may tell you something
else. 

Are you presenting preliminary data or something fairly conclusive?
Are you presenting a conclusion or a hypothesis for further study? “Preliminary”
and “interesting” can mean “unproved.”

If the result is not reasonably conclusive, should there be further studies 
and what kind? 

How many subjects, patients, cases, or people are you talking about?
Are these numbers large enough, statistically rigorous enough, to get the 
answers you want? Was there an adequate number of patients to show a 
difference between treatments? Why are you calling a press conference to 
report on four patients? 

Small numbers can sometimes carry weight. And they may 
sometimes be the only ones possible. “Sometimes small samples 
are the best we can do," one researcher says. But larger numbers 
are always more likely to pass statistical muster. 
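As a rough illustration of why larger numbers pass muster more easily, the Python sketch below applies a common textbook approximation for how many patients each of two groups needs so that a real difference in outcome rates has an 80 percent chance of reaching P < .05. The function name and the example rates are hypothetical; actual trial planning involves more assumptions and, ideally, a statistician.

```python
import math
from statistics import NormalDist


def patients_per_group(p1: float, p2: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate patients needed in EACH of two groups to detect a
    difference between outcome rates p1 and p2 with a two-sided test.

    Normal-approximation formula (a sketch, not a full design method):
        n = (z_{1-alpha/2} + z_{power})^2 * (p1(1-p1) + p2(1-p2)) / (p1 - p2)^2
    """
    if p1 == p2:
        raise ValueError("p1 and p2 must differ")
    z = NormalDist()                    # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = .05
    z_power = z.inv_cdf(power)          # about 0.84 for 80 percent power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_power) ** 2 * variance / (p1 - p2) ** 2
    return math.ceil(n)                 # round up: no fractional patients


# Hypothetical: detect a drop in bad outcomes from 20 to 10 percent.
print(patients_per_group(0.20, 0.10))  # 197 per group
# A smaller hypothetical difference, 12 versus 10 percent, needs far more.
print(patients_per_group(0.12, 0.10))  # 3839 per group
```

Because the required number grows with the inverse square of the difference, halving the difference to be detected roughly quadruples the patients needed. The approximation is also least trustworthy for very small groups, one more reason for caution about a press conference on four patients.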

The number studied can also depend on the subject. A 
thorough physiological study of five cases of some difficult disorder
may be important. One new case of smallpox would be a
shocker in a world in which smallpox has supposedly been eliminated.
In June 1981 the federal Centers for Disease Control
reported that five young men, all active homosexuals, had been 
treated for Pneumocystis carinii pneumonia at three Los Angeles 
hospitals. 1 This alerted the world to what soon became the 
AIDS epidemic. 
the treatment, (2) those getting it, and (3) those assessing the outcome know
who was getting what, or were they indeed blinded, knowing only that they 
were comparing A and B (or A, B, and C, perhaps)? 

Could those giving or getting the treatment have easily guessed which 
was which by a difference in reaction or taste or other results? 

Not every study can be a blind study. One researcher says, 
“There can be ethical problems in not telling patients what drug
they’re taking and the possible side effects. People are not guinea
pigs.” True enough, but a blinded study will always carry more
conviction. 

Were there other accepted quality controls? For example, making 
sure (perhaps by counting pills or studying urine samples) that 
the patients supposed to take a pill really took it. 

Were you able to follow your protocol or study plan? 

If there were questionnaires, interviews, or a survey: Were 
the questions likely to elicit accurate, reliable answers? Was it really possible 
to get accurate answers to these questions?

Sampling is as common in medical studies as in political 
polling. Every study examines a sample, not the whole population.
The sample must be reasonably accurate to give valid
results. But badly worded questions can also distort the results.
Respondents’ answers can differ sharply, depending on how
questions are asked. Example: In one study 1,153 subjects were
asked which is safer: a treatment that kills 10 of every
100 patients or a treatment with a 90 percent survival rate.
More people voted for the second way of saying precisely the
same thing.*

People commonly give inaccurate answers to sensitive 
questions, such as those about sexual behavior. They are notoriously
inaccurate in reporting their own medical histories, even
those of recent months. 

Ask: Did you pretest your questions for effectiveness before doing your
actual survey? 

Also: What was your nonresponse rate? Do you report it?
Could your results have occurred just by chance? Have any statistical
tests been applied to test this? 

Did you calculate a P value? Was it favorable—.05 or less? (Reported
as < .05; see Chapter 3.) P values and confidence statements
need not be regarded as straitjackets, but like jury verdicts,
they indicate reasonable doubt or reasonable certainty.

Remember that positive findings are more likely to be reported
and published than negative findings. Remember that a
favorable-sounding P value of < .05 means only that, if there were
actually no effect, results at least this striking would turn up by
pure chance less than 1 time in 20, a 5 percent probability.
So among many studies of treatments that in truth do nothing,
about 1 in every 20 will still yield a misleading, “statistically
significant” false positive.
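A simulation can make this concrete. The Python sketch below, a hypothetical illustration with made-up group sizes and function names, runs 2,000 imaginary trials of a treatment that has no effect at all (both groups are drawn from the same bell curve) and counts how often chance alone produces P < .05.

```python
import math
import random
from statistics import NormalDist


def two_sided_p(z: float) -> float:
    """Two-sided P value for a z statistic under the standard normal curve."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


def false_positive_rate(trials: int = 2000, n: int = 50, seed: int = 1) -> float:
    """Simulate trials in which the 'treatment' truly does nothing: both
    groups come from the same normal distribution (mean 0, SD 1).
    Return the fraction of trials that still reach P < .05."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    significant = 0
    for _ in range(trials):
        treated = [rng.gauss(0, 1) for _ in range(n)]
        control = [rng.gauss(0, 1) for _ in range(n)]
        diff = sum(treated) / n - sum(control) / n
        z = diff / math.sqrt(2 / n)  # SD is known to be 1 in this simulation
        if two_sided_p(z) < 0.05:
            significant += 1
    return significant / trials


rate = false_positive_rate()
print(f"{rate:.3f}")  # close to 0.05
```

Roughly 5 percent of these do-nothing trials, on the order of 100 of the 2,000, come out "significant." If only those were reported, readers would see nothing but flukes.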

There are also ways and ways of arriving at P values. For 
example, an investigator may choose to report one of several end 
points: death, length of survival, blood pressure, other measurements,
or just the patient’s condition on leaving the hospital. All
can be important, but a P value can be misleading if the wrong 
one is picked or emphasized. 

You might want to ask: Are all the important end points and their 
P values reported? Also: Was the test giving the P value the appropriate
test, as planned in your written protocol, or did you finally do more than
one kind of test? (And perhaps report only the best answer?) What 
were the other values? 
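The danger in picking among end points can be put in numbers. Assuming, for simplicity, that each end point is tested independently and that the treatment truly does nothing, the chance that at least one of k end points reaches P < .05 is 1 - 0.95^k. The Python sketch below computes that, along with the Bonferroni correction, one common (and conservative) remedy that divides the .05 cutoff by the number of tests. The function names are illustrative only; real end points are usually correlated, so the true inflation is typically smaller, though still real.

```python
def chance_of_a_fluke(alpha: float, endpoints: int) -> float:
    """Chance that at least one of several INDEPENDENT end points reaches
    P < alpha by luck alone when the treatment truly has no effect."""
    return 1 - (1 - alpha) ** endpoints


def bonferroni_cutoff(alpha: float, endpoints: int) -> float:
    """Stricter per-end-point cutoff that keeps the overall chance of any
    false positive at or below alpha (the Bonferroni correction)."""
    return alpha / endpoints


for k in (1, 5, 10):
    print(k, round(chance_of_a_fluke(0.05, k), 3), round(bonferroni_cutoff(0.05, k), 4))
# 1 0.05 0.05
# 5 0.226 0.01
# 10 0.401 0.005
```

With five independent end points, the odds of at least one "significant" fluke are already better than 1 in 5.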

Did you collaborate with a statistician in both your design and your 
analysis? A statistician’s collaboration often may be indicated in a 
credit or footnote. 

In studies seeking cause and effect, remember that association 
is not necessarily causation. Rutgers’ Dr. Michael Greenberg 
reminds us, “Mathematical methods cannot establish proof of 
cause and effect. They can indicate the probability that a relationship
occurred by chance, can sometimes quantify the existing
relationship between actions and effects, and can under the
best circumstances be used to predict the impact of actions even
if the complex phenomena driving them are not understood. 

. . . View mathematical associations with a healthy degree of
skepticism.”

A true experiment, controlling all variables, can sometimes 
prove cause and effect almost surely. This is easier in physics 
and chemistry than in human biology. When, then, does a close
association in an observational study (rather than a controlled
experiment) indicate causation? There are several possible criteria
that you can ask about:

Is the association consistent? Are similar results usually found in 
different places and by different research methods? 

How strong is the association? If risk is an appropriate way of 
describing a particular situation: What is the relative risk , or the risk 
ratio? The word “strong” is used here in its mathematical sense.
It mainly means the magnitude of an effect or risk, the odds favoring
the outcome of interest versus no such outcome.

A relative risk, or risk ratio, compares two rates by dividing 
one by the other. In an American Cancer Society smoking study 
(see page 46) the lung cancer mortality rate in nonsmokers aged 
55 to 69 was 19 per 100,000 per year; the risk in smokers was 
188 per 100,000. Since 188 divided by 19 equals 9.89, the
smokers were about 9.9 times as likely to die from lung
cancer—their relative risk was 9.9.‘ That’s strong! 
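The relative-risk arithmetic above is a single division. A minimal Python sketch, with a hypothetical function name, reproduces it; the one requirement is that both rates be expressed in the same units.

```python
def relative_risk(rate_exposed: float, rate_unexposed: float) -> float:
    """Relative risk (risk ratio): rate in the exposed group divided by the
    rate in the unexposed group. Both rates must use the same units, here
    lung cancer deaths per 100,000 men per year."""
    return rate_exposed / rate_unexposed


rr = relative_risk(188, 19)  # smokers vs. nonsmokers, aged 55 to 69
print(round(rr, 2))  # 9.89
```

A relative risk of 1 means no difference between the groups; the farther above 1, the stronger the association.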

Is there an impressive dose-response, or cause-and-effect, curve—a 
curve or gradient that shows that the greater the exposure to the 
agent, or cause, the greater the effect? Heavy smokers are indeed
at greater risk than moderate smokers, and moderate
smokers at greater risk than light smokers. (In some cases—this
is an unsettled matter—there may be a threshold effect, an effect
only after some minimum dose.) 
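A dose-response check amounts to computing the relative risk at each exposure level against the unexposed group and asking whether it climbs at every step. The rates in this Python sketch are invented purely for illustration and do not come from any study; the function names are hypothetical too.

```python
def relative_risks(rates: list[tuple[str, float]]) -> list[tuple[str, float]]:
    """Relative risk at each exposure level versus the first (unexposed)
    level. `rates` must be ordered from least to most exposure."""
    baseline = rates[0][1]
    return [(label, rate / baseline) for label, rate in rates]


def rises_with_dose(rrs: list[tuple[str, float]]) -> bool:
    """True if the relative risk goes up at every step of exposure."""
    values = [rr for _, rr in rrs]
    return all(later > earlier for earlier, later in zip(values, values[1:]))


# HYPOTHETICAL death rates per 100,000 per year, invented for illustration.
rates = [("none", 20.0), ("light", 60.0), ("moderate", 120.0), ("heavy", 240.0)]
rrs = relative_risks(rates)
for label, rr in rrs:
    print(f"{label:>8}: {rr:.1f}")  # 1.0, 3.0, 6.0, 12.0
print(rises_with_dose(rrs))  # True
```

A gradient alone does not prove causation, but a risk that rises step by step with dose is harder to explain away than a single two-group comparison.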

Another way of asking about risk and response: What is the 
correlation coefficient—the extent to which a set of measurements of
the association is linear? A perfect linear relationship, or correlation,
between two observations or variables would show up as
data points lying exactly on a straight line on a graph. A perfect
positive correlation, in which the line rises steadily, is given the value +1; +.5 would be a lesser
but still interesting relationship; -1 or any negative figure indicates
an inverse or negative relationship, such as a runner's speed
going down as his weight goes up. A correlation of zero means
no consistent linear association.
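For readers who want to see where the number comes from, here is a minimal Python sketch of the usual (Pearson) correlation coefficient, written out by hand; Python 3.10 and later also provide it as `statistics.correlation`. The function name and data points are made up. The last example is a U-shaped pattern: the two variables are plainly related, yet r comes out to zero, because the coefficient measures only straight-line association.

```python
import math


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient: +1 for a perfect rising straight
    line, -1 for a perfect falling one, near 0 for no linear association."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]  # deviations from the mean of x
    dy = [y - mean_y for y in ys]  # deviations from the mean of y
    covariation = sum(a * b for a, b in zip(dx, dy))
    spread = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return covariation / spread


doses = [0, 1, 2, 3, 4]
print(round(pearson_r(doses, [2 * d + 1 for d in doses]), 6))   # 1.0
print(round(pearson_r(doses, [10 - 3 * d for d in doses]), 6))  # -1.0
print(round(pearson_r(doses, [5, 1, 3, 1, 5]), 6))              # 0.0
```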

How specific is the association? Does a supposed cause lead to 
many supposed effects? Or does an effect depend on many supposed
causes? Such associations are less specific, and thus more
suspect, until positive evidence piles up. Smoking indeed causes 
many effects. A lung disease, asbestosis, is most common when 
there is exposure to both asbestos and cigarette smoke. 

Does the supposed cause precede the effect? Is a supposed biological 
association epidemiologically plausible? One strong argument for a
cause-and-effect relationship between high consumption of saturated
fats and cholesterol and coronary heart disease is that
populations on such diets generally develop more such disease 
than those on leaner diets. 

Does the association make biological sense? Does it agree with 
current biological and physiological knowledge? You can’t follow
this test out the window. Much biological fact is ill understood. 
Also, Mosteller warns, “Someone nearly always will claim to see a 
[biological or physiological] association. But the people who 
know the most may not be willing to." 7 

Finally, look for the real why. Ask: Are there other possible 
explanations? Did you look for other explanations—confounders, or confounding
variables, that may be producing or helping produce the
association? Sometimes we read that married people live longer 
than singles. Does marriage really increase life span, or may 
medical or other problems make some people less likely to 
marry and also die sooner? Maybe the Dutch thought storks 
brought babies because better-off families had more chimneys, 
more storks, and more babies. 
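Confounding, and its standard cure, stratification, can be shown with a small made-up table. In this hypothetical Python sketch, a third factor (here called age group) raises the risk of disease and also makes people more likely to be exposed. Pooling everyone produces an impressive-looking relative risk; comparing exposed and unexposed people within each age group shows no effect at all. All counts and names are invented for illustration.

```python
def risk(cases: int, total: int) -> float:
    """Proportion of a group that develops the disease."""
    return cases / total


# HYPOTHETICAL counts, invented for illustration. For each age group:
# (exposed cases, exposed total, unexposed cases, unexposed total)
strata = {
    "older": (180, 900, 20, 100),   # high baseline risk, mostly exposed
    "younger": (5, 100, 45, 900),   # low baseline risk, mostly unexposed
}

# Crude analysis: pool both age groups and ignore age entirely.
exposed_cases = sum(s[0] for s in strata.values())
exposed_total = sum(s[1] for s in strata.values())
unexposed_cases = sum(s[2] for s in strata.values())
unexposed_total = sum(s[3] for s in strata.values())
crude_rr = risk(exposed_cases, exposed_total) / risk(unexposed_cases, unexposed_total)
print(f"crude relative risk: {crude_rr:.2f}")  # 2.85

# Stratified analysis: compare exposed with unexposed WITHIN each age group.
stratum_rr = {
    name: risk(ec, et) / risk(uc, ut)
    for name, (ec, et, uc, ut) in strata.items()
}
for name, rr in stratum_rr.items():
    print(f"{name}: {rr:.2f}")  # 1.00 in both age groups
```

The apparent near-tripling of risk comes entirely from who happened to be exposed, the same trap as the storks and the chimneys.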

Did you take steps to control or adjust for other possible explanations?
Did you do a stratified analysis—a breakdown of the data by strata
like sex, race, socioeconomic status, geographical area, occupation?
Men commonly have more bronchitis and cirrhosis of the
point the same at onset? At diagnosis? At start of treatment? Were they
judged by the same disease definitions at the start and the same measures of
severity and outcome? 

Did the intervention have the good results that were intended? Has 
there been an evaluation to see whether it was a useful result?

Investigators often report that a drug or other measure has 
lowered blood cholesterol levels. True, but were they able to
show that it reduced the number of heart attacks? Or was reduction
of a supposed risk factor itself taken to mean the hoped-for
outcome? That may often be necessary, but the issue should be 
discussed. 

Investigators once reported that a new heart drug reduced 
the number of recurrent myocardial infarctions (heart attacks), 
fatal and nonfatal. But total mortality for all causes was higher 
in the treated group than in a placebo group. 

Public health officials may announce the success of a campaign
to detect high blood pressure: X number of
people were found to be hypertensive and were referred to their 
doctors. But how many went to their doctors? How many of 
those received optimum treatment? Were their blood pressures 
reduced? (If they were, the evidence is strong that they should 
suffer fewer strokes.) 

In short: What was the bottom line? Did you really do any good?

To whom do your results apply? Can they be generalized to a larger
population? Are your patients like the average doctor's patients? Is there any
basis in these findings for any patient to ask his or her doctor for a change in
treatment? Clinic populations, hospital populations, and the 
“worst cases” are not necessarily typical of patients in general, 
and improper generalization is unfortunately common in the 
medical literature. 

Again and again, in many of the cases cited in this chapter, 
ask: Do other studies back you up? Are your results consistent with other
clinical and experimental findings? Have your results been repeated or 
own work’s importance. 1 ' But there are many exceptions.

Ask others in the same field: How do other informed people 
regard this report—and these investigators? Are they speaking in their own 
area of expertise, or have they shown real mastery if they have ventured 
outside it? Have their past results generally held up? And what are some 
good questions I can ask them? True, a lot of brilliant and original
work has been pooh-poohed for a time by others. Still, scientists 
survive only by eventually convincing their colleagues. 

More formally: Has there been a review of the data and conclusions 
by any disinterested parties? Some major clinical studies are re¬ 
viewed by independent second parties or committees. Reports 
of the National Academy of Sciences must pass muster by a 
review committee. 

Has there been peer review of the material? That is, has it been
examined by referees who were sent the article by a journal 
editor? 

And, a very important question: Has the work been published 
or accepted by a reputable journal? If not, why not? The New England 
Journal of Medicine prints only 15 percent of the papers submitted 
to it (many, of course, are rejected because they are not of 
enough interest to the journal's readers). Many have been given 
at medical or scientific meetings, yet do not pass peer reviewers’ 
or the editors’ muster. Most are eventually published elsewhere, 
many in good journals. But there are journals and journals. 

In science as a whole, including biology and often basic 
medical sciences, Science and the British Nature are indispensable. 

In general medicine and clinical science at the physician’s level, 
the best, most useful journals are probably New England Journal 
of Medicine, Journal of the American Medical Association, Annals of 
Internal Medicine, Canadian Medical Association Journal, Journal of Clinical
Investigation, and the British Lancet and British Medical Journal. There
are many equally good specialty journals as well as mediocre 
ones. In epidemiology, three good sources are American Journal of 
Epidemiology, Journal of Chronic Diseases, and Preventive Medicine . 

Ask people in any field: What are the most reliable journals, 
those where you would want your work published? 
CHAPTER


5 



QUESTIONS 


“Don’t assume that someone can interpret his own data. You 
may do better.” And “muddle around in the footnotes and appendices,”
Mosteller advises. “You might find a few horrors.
That’s how people found out that a much publicized study of
public and private schools included only about 12 private, nonparochial
schools.”

• Other things described in this chapter, such as the protocol
and study design, the criteria for admitting and randomizing
subjects, the therapy actually received (in contrast to that
planned in the protocol), blinding, complications, loss to follow-up,
follow-up time, and any discussion of reservations or
weaknesses. 

Ask, when appropriate: Where did the money to support the study 
come from? Many honest investigators are financed by companies 
that may profit from the outcome. So are some dishonest or self-deluding
investigators. But the peddler of a biased point of view
is as likely to be an antiestablishment crusader—or an academic
ladder-climber—as a corporate darling. Perhaps the best question
to ask yourself is, Is this investigator a scientist or a salesman?
In any case, the public should know any pertinent connections.

“What proportion of papers will satisfy [all] the requirements
for scientific proof and clinical applicability?” Sackett
writes, “Not very many. . . . After all, there are only a handful
of ways to do a study properly but a thousand ways to do it
wrong!” 11

Despite impeccable design, some studies yield answers that 
turn out to be wrong. Some fail for lack of understanding of 
physiology and disease. Even the soundest studies may provoke 
controversy. No study settles anything for all time. 

And according to Sackett, some “may meet considerable 
resistance when they discredit the only treatment currently 
available. . . . Clinicians may still elect to do something, even if 
it is of no demonstrable benefit. Study results may be rejected, 
Tests and Testing


6 


Testing is often the only way to answer our questions, but it doesn't produce 
unassailable, universal truths that should be carved on stone tablets. Instead, 
testing produces statistics, which must be interpreted. 

—Robert Hooke 


Who knows when thou mayest be tested? 


—Ronald Arthur Hopwood 


DO physicians always know what they’re doing when they
administer tests? Stanford’s Dr. Eugene Robin says many tests
“have not been properly evaluated and in fact may be useless or
harmful.” He asks, “Is it common practice in medicine to perform
careful clinical trials before introducing tests that can affect
the welfare of masses of patients? Sadly, the answer is no.” 1

A good test should detect both health and disease and do so 
with high accuracy. The measures of the value of a clinical test, 
one used for medical diagnosis, are sensitivity and specificity, or,
simply, the ability to avoid false negatives and false positives. Sensitivity
is how well a test identifies a disease or condition in those who
have it—how well it avoids false negatives, or missed cases. If 100
people with a condition are tested and 90 test positive, the test’s
sensitivity is 90 percent. Specificity is how well a test identifies
those who do not have the disease or condition—how well it rules
out false positives, or mistaken identifications. If 100 healthy people
are tested and 90 test negative, the test’s specificity is 90
percent. 
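Both measures are simple proportions. A minimal Python sketch, using the text's own examples of 100 sick and 100 healthy people (the function names are illustrative):

```python
def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Share of people WITH the condition whom the test flags as positive."""
    return true_positives / (true_positives + false_negatives)


def specificity(true_negatives: int, false_positives: int) -> float:
    """Share of people WITHOUT the condition whom the test calls negative."""
    return true_negatives / (true_negatives + false_positives)


# 100 people with the condition: 90 test positive, 10 are missed.
print(sensitivity(90, 10))  # 0.9
# 100 healthy people: 90 test negative, 10 are wrongly flagged.
print(specificity(90, 10))  # 0.9
```

Note that each measure is computed within a different group of people, which is why a test can score well on one and poorly on the other.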

Sensitivity, in short, tells us about disease present. Specificity tells
us about disease absent. A highly unspecific test will produce
many false positives; a highly insensitive test, many false negatives.